SharePoint Data Sources
You can add a SharePoint data source to the KB, enabling the extraction of paragraphs from Excel, Word, and PDF documents within SharePoint site libraries.
Prerequisites
- Ask your company's Microsoft Azure subscription administrator for the following Microsoft Azure details: Tenant Id, Client Id, and Client Secret.
- Your company's Microsoft Azure subscription administrator should ensure that the API permissions provided in the figure below are granted in Microsoft Azure.
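Before entering the credentials in DRUID, it can help to see how the three values fit together. The sketch below only builds a client-credentials token request for the Microsoft identity platform. It is an illustration, not DRUID's implementation: the function name `build_token_request` and the Microsoft Graph default scope are assumptions, and DRUID may request a different resource.

```python
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint (client credentials flow).
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def build_token_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    scope: str = "https://graph.microsoft.com/.default",  # assumption: Graph scope
) -> tuple[str, bytes]:
    """Return the (url, form-encoded body) of a client-credentials token request."""
    # Fail early if any of the three prerequisite values is empty.
    for name, value in (
        ("tenant_id", tenant_id),
        ("client_id", client_id),
        ("client_secret", client_secret),
    ):
        if not value or not value.strip():
            raise ValueError(f"{name} is missing")

    url = TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id.strip())
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id.strip(),
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return url, body
```

Sending the body as an HTTP POST to the returned URL (for example with `urllib.request`) and receiving a response that contains an `access_token` suggests the Tenant Id, Client Id, and Client Secret are consistent with each other.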
Adding SharePoint data sources and extracting data
This section describes how to add a SharePoint data source and extract paragraphs from documents stored in SharePoint libraries.
Step 1. Create the data source
- Click the Add New button. The Add New Data Source page opens.
- In the Name field, provide a name for the data source. It will help you easily identify and search for a data source.
- From the Language drop-down, select the language of the data you upload. It must be one of the bot languages.
- From the Type drop-down, select SharePoint. After you select the type, the page displays the additional fields required for adding a SharePoint data source.
- In the Content Location (URL) field, provide the URL of the SharePoint site from which DRUID will extract data. To find the SharePoint Content Location (URL), follow these steps:
  - Right-click on the desired site library/folder/document in Microsoft SharePoint and select Details.
  - In the right-side panel, click More details.
  - Scroll down the right-side panel until you locate the Path field, and then click the copy icon next to it. This is the Content Location (URL).
- In the Source type field, select either SharePoint Online or SharePoint 2019.
- Enter the Tenant Id and Client Id.
- For SharePoint 2019, you can crawl sites and subsites from a specific library by entering the library path in the Document Library Path field. This field is available in DRUID 8.3 and higher.
- You can authenticate with SharePoint Online in two ways: using a SharePoint client secret or a DRUID-generated certificate.
  - Client Secret: Enter the client secret provided by your Microsoft Azure subscription administrator in the Client Secret field.
  - DRUID-Signed Certificate: Select Use Certificate, then click the Create button next to the Certificate field. In the Create new certificate pop-up, enter a certificate name (you will use it to identify the certificate in the DRUID Portal), select the certificate expiry date from the Ending at field, and click the Create button. The newly created certificate is automatically selected in the Certificate field.
- Optionally, set the Min score threshold and the Target match score for the data source. If not set, the thresholds from the Knowledge Base will apply.
- To verify the SharePoint credentials, click the Test button. If the test fails, review the SharePoint credentials to ensure they are correct. You can also verify the SharePoint credentials later by going to the Details tab of the SharePoint data source and clicking the Test button at the bottom of the page.
- Click Create. The SharePoint data source appears on the Knowledge base page.
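The fallback rule for the optional thresholds can be pictured with a short sketch. The `Thresholds` class and `effective_thresholds` function are hypothetical names used only for illustration, and the sketch assumes each threshold falls back to the Knowledge Base value independently; DRUID's internal handling may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # None means "not set" at this level.
    min_score: Optional[float] = None
    target_match_score: Optional[float] = None


def effective_thresholds(data_source: Thresholds, knowledge_base: Thresholds) -> Thresholds:
    """Use the data source value when set, otherwise the Knowledge Base value."""

    def pick(own: Optional[float], inherited: Optional[float]) -> Optional[float]:
        return own if own is not None else inherited

    return Thresholds(
        min_score=pick(data_source.min_score, knowledge_base.min_score),
        target_match_score=pick(
            data_source.target_match_score, knowledge_base.target_match_score
        ),
    )
```

For example, a data source that sets only the Min score threshold would keep its own minimum and take the Target match score from the Knowledge Base.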
Step 2. Crawl the SharePoint data source
On the Knowledge base page, click the SharePoint data source. The data source page opens on the Extracted paragraphs tab by default.
Click the Start crawling button at the top left corner of the data source. The Start Crawling Parameters page appears.
Define the crawling policy by setting the parameters described in the table below.
Parameter | Description |
---|---|
URL | Automatically populated with the Content Location (URL) you specified when adding the data source. |
Depth | The number of directory levels the crawler will explore from the URL. Note: To improve crawling efficiency, crawl each node individually instead of the entire root, especially if the storage has a deep structure. Set the depth to '0' to achieve this. |
After you define the crawling policy, click Start.
As the crawler visits the link provided in the Content Location (URL) field, it will identify all the hyperlinks in the retrieved web pages and will add them to the list of URLs to visit.
To crawl a specific node, click the dots next to the desired node in the file repository explorer and select Crawl Path.
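To make the Depth parameter more concrete, here is a minimal sketch of a depth-limited walk over a folder tree. It is not DRUID's crawler: the `crawl` function and the dictionary-based tree are hypothetical, and the sketch assumes the starting node counts as level 0, so a depth of 0 visits only that node without descending.

```python
from collections import deque


def crawl(tree: dict, start: str, depth: int) -> list:
    """Breadth-first walk from `start`, descending at most `depth` levels below it.

    `tree` maps a folder path to the list of its direct children;
    files have no entry of their own.
    """
    visited = []
    queue = deque([(start, 0)])
    while queue:
        path, level = queue.popleft()
        visited.append(path)
        # Only descend while the depth budget is not used up.
        if level < depth:
            for child in tree.get(path, []):
                queue.append((child, level + 1))
    return visited
```

Under this assumption, crawling a deep library node by node with depth 0 keeps each run small, which matches the efficiency note in the table.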
Step 3. Extract the text articles
To ensure that only relevant content is captured and added to the Knowledge Base, all nodes are excluded from scraping by default. You can extract paragraphs from the entire SharePoint library (the data source root) or from a specific library element (node / leaf).
To extract data from the SharePoint library, on the Extracted Paragraphs tab of your SharePoint data source, click the Extract button at the top left corner of the page.
You can select the pages DRUID will extract information from during the extraction process. To include specific pages in the scraping, click the dots next to the desired file explorer element and select Include. Subsequently, you can exclude certain pages from scraping by clicking the dots next to the desired file explorer element and selecting Exclude, then click . The pages excluded from scraping appear on the Details tab, in the Exclude from scrapping area.
To extract data only from specific tree elements (node/folder, leaf/file), select them and from the bulk action icon, select Extract selected.
If you want to extract information only from a specific tree element, click the dots next to the tree element and select Extract.
When the extraction completes, the extracted paragraphs are displayed on the Extracted paragraphs > Content tab.
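The include and exclude behavior described above can be sketched as a lookup over the tree. The `is_included` function is hypothetical, and the sketch assumes that an Include or Exclude mark on a node also applies to everything beneath it unless a closer mark overrides it; DRUID's actual inheritance rules may differ.

```python
def is_included(path: str, rules: dict) -> bool:
    """Return True if `path` should be extracted.

    `rules` maps a node path to True (Include) or False (Exclude).
    The closest marked ancestor wins; unmarked paths are excluded,
    mirroring the excluded-by-default behavior.
    """
    current = path.rstrip("/")
    while current:
        if current in rules:
            return rules[current]
        current = current.rpartition("/")[0]  # move up one level
    return False
```

With `rules = {"/docs": True, "/docs/hr/old": False}`, a file under `/docs/hr` would be extracted, a file under `/docs/hr/old` would not, and anything outside `/docs` stays excluded.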
Step 4. Train the data source
To ensure the KB Engine searches through the data source articles, it's crucial to train your data source. Click the Train button at the top-left corner of the data source.
Testing the data source performance
Testing the performance of a data source is important because it ensures that the extracted articles are relevant. This process helps identify and rectify any issues, improving the overall quality and effectiveness of your bot's responses. By validating the data source performance, you can enhance user satisfaction.
To test the performance of the data source, on the Extracted Paragraphs page, in the User Says area, enter a question and select the language. All matched articles will be displayed along with their scores.
You can improve the performance of the data source by reviewing and editing the articles based on your needs.
Editing paragraphs
To keep your Knowledge Base at high quality, we recommend that you review the extracted articles and take the appropriate actions to improve them: open the URL from which the crawler extracted the paragraph and compare the content, then edit or delete the paragraph. Refine your paragraphs by transforming unstructured data into a question-and-answer format.
To edit a paragraph, click the dots next to the paragraph and click Edit. Update the paragraph (user intent and answer) and click the Save icon at the top right corner of the page.
Fine-tuning Predictions
You can configure Advanced Settings at both the data source and node/leaf levels to achieve more precise predictions. This approach offers granular control, allowing you to adjust the extractors and trainable elements, resulting in better accuracy and performance. Unlike KB-level settings, which apply changes broadly, this targeted method adapts configurations to the unique needs of each data source or element, streamlining your authoring process.
Fine-tuning at the data source level
- Navigate to the desired data source.
- Select the Advanced Settings tab.
- Modify advanced parameters as needed and save the settings.
Fine-tuning at the node or leaf level
- In the tree explorer, select the desired node or leaf.
- On the right side, select the Advanced Settings tab.
- Modify advanced parameters as needed and save the settings.
Reset advanced settings
To reset advanced configurations at the data source and node/leaf levels to match the KB Advanced settings, go to Knowledge Base > Advanced Settings and click the Save to All button. This action streamlines your settings management by applying consistent KB Advanced settings across your entire configuration with just one click.
Enhance KB prediction
Refine your articles by transforming unstructured data into a question-and-answer format. Edit articles and add question / title / short description.
Access the Knowledge Base Advanced Settings, set the "trainableColumns" parameter to "Question,Answer", then train the Knowledge Base. The KB Engine will leverage both questions and answers from unstructured data sources during the prediction process, ultimately leading to improved prediction accuracy.
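As a rough illustration of what a comma-separated "trainableColumns" value means, the sketch below joins the named article fields into one training text. This is not the KB Engine's implementation: the `training_text` function and the dictionary shape of an article are assumptions made for the example.

```python
def training_text(article: dict, trainable_columns: str) -> str:
    """Join the article fields named in a comma-separated column list."""
    # "Question,Answer" -> ["Question", "Answer"]; blanks are ignored.
    columns = [name.strip() for name in trainable_columns.split(",") if name.strip()]
    # Skip columns that are missing or empty in this article.
    return " ".join(article[name] for name in columns if article.get(name))
```

With "Question,Answer", both fields contribute to the text; with "Answer" alone, an article edited to include a question would gain nothing from that edit, which is why the setting matters after restructuring articles into question-and-answer form.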
Authenticate with SharePoint by using a DRUID-generated certificate
To authenticate with SharePoint using a DRUID-generated certificate, follow these steps:
Step 1. Create certificate
Go to the Knowledge Base Advanced Settings and click on Authentication Certificates.
Click the Create Certificate button. The Create new certificate pop-up appears.
Enter a name for the certificate (you will use it to identify the certificate in the DRUID Portal), select the certificate expiry date from the Ending at field and click the Create button.
The certificate appears in the Certificates list. Download it to your computer.
Click the Save&Close button.
Step 2. Import the certificate in Azure
To import the DRUID-generated certificate in Azure, follow these steps:
- In the Azure Portal, go to App registrations > All applications > SharePoint File Discovery.
- From the left menu, click Manage and select Certificates & secrets.
- On the page, click the Certificates tab, then click Upload certificate.
- Browse for the certificate you downloaded from the DRUID Portal and select it.
Once the certificate is successfully uploaded, you can create the SharePoint data source.
Step 3. Create the SharePoint Data Source
Create the SharePoint data source by following the procedure in Step 1 (Create the data source) of the data source setup described above, with these specific settings: select Use Certificate, then select the desired certificate from the Certificate field.
Authenticate with SharePoint when the DRUID-generated certificate has expired
If you have a SharePoint data source that uses a DRUID-generated certificate for authentication and the certificate has expired, follow these steps:
Step 1. Delete the expired certificate and create a new one
Go to the Knowledge Base Advanced Settings and click on Authentication Certificates. In the Certificates list, click the delete icon next to the expired certificate.
In the confirmation dialog, click Yes to confirm the certificate deletion.
You can now create a new certificate or use an existing valid certificate.
To create a new certificate, click the Create Certificate button. The Create new certificate pop-up appears.
Enter a name for the certificate (you will use it to identify the certificate in the DRUID Portal), select the certificate expiry date from the Ending at field and click the Create button.
The certificate appears in the Certificates list. Download it to your computer.
Click the Save&Close button.
Step 2. Import the certificate in Azure
To import the DRUID-generated certificate in Azure, follow these steps:
- In the Azure Portal, go to App registrations > All applications > SharePoint File Discovery.
- From the left menu, click Manage and select Certificates & secrets.
- On the page, click the Certificates tab, then click Upload certificate.
- Browse for the certificate you downloaded from the DRUID Portal and select it.
Step 3. Select the new certificate on the SharePoint data source
From the Knowledge Base page, click on the SharePoint data source with the expired authentication certificate and click the Details tab. Choose the new certificate from the Certificate field and click Save. To verify the credentials, click the Test button.
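To avoid discovering an expired certificate only when crawling fails, you could track the Ending at dates yourself. The sketch below is a hypothetical helper, not a DRUID feature: `certificate_status`, the 30-day warning window, and the assumption that a certificate is still valid on its end date are all illustrative choices.

```python
from datetime import date, timedelta


def certificate_status(ending_at: date, today: date, warn_days: int = 30) -> str:
    """Classify a certificate by its expiry date: 'expired', 'expiring', or 'valid'."""
    # Assumption: the certificate is still usable on the end date itself.
    if ending_at < today:
        return "expired"
    if ending_at - today <= timedelta(days=warn_days):
        return "expiring"
    return "valid"
```

A certificate reported as "expiring" is a cue to create a new one, upload it in Azure, and select it on the data source before the old one lapses.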